Add all fbgemm kernel Tensors into Int4WeightOnlyConfig and Float8DynamicActivationInt4WeightConfig #2474


Open · wants to merge 1 commit into base: jerryzh168/stack/9

Conversation

jerryzh168
Contributor

@jerryzh168 jerryzh168 commented Jul 2, 2025

Stacked PRs:


Add all fbgemm kernel Tensors into Int4WeightOnlyConfig and Float8DynamicActivationInt4WeightConfig

Summary:
We will:

  • deprecate FbgemmConfig (later), since it corresponds to a single kernel
  • categorize tensors by derived dtype + packing format, e.g. int4 preshuffled, float8 plain
  • add a PackingFormat with preshuffled, plain, and _legacy (for the legacy implementation)
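
The PackingFormat described above can be pictured as a small string-valued enum. This is an illustrative sketch based only on the names in this PR's summary; the actual torchao definition may differ (e.g. it could be a str Literal):

```python
from enum import Enum

# Illustrative sketch of PackingFormat (names taken from this PR's summary;
# the real torchao type may be defined differently).
class PackingFormat(str, Enum):
    PRESHUFFLED = "preshuffled"  # fbgemm preshuffled int4 packed layout
    PLAIN = "plain"              # plain packed layout
    LEGACY = "_legacy"           # previous AQT-based implementation, to be removed later

# Because it subclasses str, members compare equal to plain strings,
# so a config default like packing_format="_legacy" still matches.
assert PackingFormat.LEGACY == "_legacy"
```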

Test Plan:
python test/quantization/quantize_/workflows/int4/test_int4_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py



pytorch-bot bot commented Jul 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2474

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 1 Cancelled Job, 2 Unrelated Failures

As of commit aba5d26 with merge base 975bd57:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from a3d0835 to 4b0c7c7 Compare July 2, 2025 01:58
jerryzh168 added a commit that referenced this pull request Jul 2, 2025
…micActivationInt4WeightConfig

Summary:
att, we will deprecate FbgemmConfig since it's a single kernel.
we'd like to categorize things to derived dtype + packed format

Test Plan:
python test/quantization/quantize_/test_int4_groupwise_preshuffle.py

Reviewers:

Subscribers:

Tasks:

Tags:

stack-info: PR: #2474, branch: jerryzh168/stack/10
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 2, 2025
@jerryzh168 jerryzh168 added the topic: new feature Use this tag if this PR adds a new feature label Jul 2, 2025
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 2, 2025 20:35
Contributor

@andrewor14 andrewor14 left a comment

Looks great, left some comments mostly about documentation

@@ -1123,6 +1125,9 @@ class Int4WeightOnlyConfig(AOBaseConfig):
zero_point_domain: Optional[ZeroPointDomain] = ZeroPointDomain.NONE
set_inductor_config: bool = True
preserve_zero: Optional[bool] = None
# since not all tensors are migrated to the new structure yet,
# we use `_legacy` to represent the previous layout
packing_format: PackingFormat = "_legacy"
Contributor
Can "legacy" mean different things for different configs? I wonder if we should make this optional instead, where None represents "legacy"?

Contributor Author

@jerryzh168 jerryzh168 Jul 15, 2025

Yeah, legacy just means no packing format (it's implemented with AQT). I plan to remove support for legacy at some point and don't want to complicate the typing here.
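
The trade-off discussed here (a "_legacy" string sentinel versus an Optional field) can be sketched as follows; both dataclasses are hypothetical illustrations, not the actual config code:

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical stand-in for the PackingFormat type discussed in this PR.
PackingFormat = Literal["preshuffled", "plain", "_legacy"]

@dataclass
class WithSentinel:
    # The approach taken in the PR: a "_legacy" sentinel keeps the field
    # non-optional, and is easy to delete once legacy support is removed.
    packing_format: PackingFormat = "_legacy"

@dataclass
class WithOptional:
    # The alternative raised in review: None represents "legacy",
    # at the cost of Optional handling at every use site.
    packing_format: Optional[PackingFormat] = None

assert WithSentinel().packing_format == "_legacy"
assert WithOptional().packing_format is None
```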

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: new feature Use this tag if this PR adds a new feature